A Novel Prioritization Technique for Solving Markov Decision Processes
Abstract
We address the problem of computing an optimal value function for Markov decision processes (MDPs). Since finding this function quickly and accurately requires substantial computational effort, techniques that accelerate fundamental algorithms have been a main focus of research. Among them, prioritization solvers address the problem of ordering backup operations. Prioritizing the sequence of backups reduces the number of backups needed considerably, but incurs significant overhead. This paper provides a new way to order backups, based on a mapping of the state space into a metric space. Empirical evaluation verifies that our method achieves the best balance between the number of backups executed and the effort required to prioritize them, showing an order-of-magnitude improvement in runtime on a number of benchmarks.
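The general idea of prioritized backups can be illustrated with a minimal sketch in the spirit of prioritized sweeping: states are kept in a priority queue keyed by their Bellman residual, the state with the largest residual is backed up next, and its predecessors are reprioritized. The toy MDP, predecessor map, and priority rule below are illustrative assumptions, not the paper's metric-space method.

```python
import heapq

# Toy MDP: state -> {action: [(next_state, prob, reward), ...]}.
# These transitions are illustrative, not taken from the paper.
GAMMA = 0.95
P = {
    0: {"a": [(1, 1.0, 0.0)], "b": [(2, 1.0, 1.0)]},
    1: {"a": [(2, 1.0, 2.0)]},
    2: {"a": [(2, 1.0, 0.0)]},  # absorbing state
}
# Predecessors: which states can reach s in one step.
PRED = {0: set(), 1: {0}, 2: {0, 1, 2}}

def backup(V, s):
    """One Bellman backup: V(s) = max_a sum_t p * (r + gamma * V(t))."""
    return max(
        sum(p * (r + GAMMA * V[t]) for t, p, r in outcomes)
        for outcomes in P[s].values()
    )

def prioritized_vi(eps=1e-6):
    """Value iteration ordered by Bellman residual (prioritized sweeping)."""
    V = {s: 0.0 for s in P}
    # heapq is a min-heap, so store negated residuals.
    heap = [(-abs(backup(V, s) - V[s]), s) for s in P]
    heapq.heapify(heap)
    while heap:
        _, s = heapq.heappop(heap)
        res = abs(backup(V, s) - V[s])
        if res < eps:  # stale or already-converged entry
            continue
        V[s] = backup(V, s)
        # Reprioritize predecessors whose residual may have grown.
        for p in PRED[s]:
            p_res = abs(backup(V, p) - V[p])
            if p_res > eps:
                heapq.heappush(heap, (-p_res, p))
    return V
```

Compared with plain value iteration, which sweeps all states each round, this ordering concentrates work where values are still changing; the overhead of maintaining the queue is exactly the cost the paper's technique aims to reduce.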
Similar Resources
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be grouped into levels. At each level, smaller problems, called restricted MDPs, are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
Topological Orders Based Planning for Solving POMDPs
Although partially observable Markov decision processes (POMDPs) have received significant attention in past years, to date, solving problems of realistic order of magnitude remains a serious challenge. In this context, techniques that accelerate fundamental algorithms have been a main focus of research. Among them prioritized solvers suggest solutions to the problem of ordering backup operatio...
Scaling Up: Solving POMDPs through Value Based Clustering
Partially Observable Markov Decision Processes (POMDPs) provide an appropriately rich model for agents operating under partial knowledge of the environment. Since finding an optimal POMDP policy is intractable, approximation techniques have been a main focus of research, among them point-based algorithms, which scale relatively well up to thousands of states. An important decision in a point...
Symbolic LAO* Search for Factored Markov Decision Processes
We describe a planning algorithm that integrates two approaches to solving Markov decision processes with large state spaces. It uses state abstraction to avoid evaluating states individually. And it uses forward search from a start state, guided by an admissible heuristic, to avoid evaluating all states. These approaches are combined in a novel way that exploits symbolic model-checking techniq...
Safe Q-Learning on Complete History Spaces
In this article, we present an idea for solving deterministic partially observable Markov decision processes (POMDPs) based on a history space containing sequences of past observations and actions. A novel and sound technique for learning a Q-function on history spaces is developed and discussed. We analyze certain conditions under which a history based approach is able to learn policies compar...